MRI radiomics enhances radiologists’ ability for characterizing intestinal fibrosis in patients with Crohn’s disease

Objectives We aimed to develop MRI-based radiomic models (RMs) to improve the diagnostic accuracy of radiologists in characterizing intestinal fibrosis in patients with Crohn’s disease (CD). Methods This retrospective study included patients with refractory CD who underwent MR before surgery from November 2013 to September 2021. Resected bowel segments were histologically classified as none-mild or moderate-severe fibrosis. RMs based on different MR sequence combinations (RM1: T2WI and enhanced-T1WI; RM2: T2WI, enhanced-T1WI, diffusion-weighted imaging [DWI], and apparent diffusion coefficient [ADC]); RM3: T2WI, enhanced-T1WI, DWI, ADC, and magnetization transfer MRI [MTI]), were developed and validated in an independent test cohort. The RMs’ diagnostic performance was compared to that of visual interpretation using identical sequences and a clinical model. Results The final population included 123 patients (81 men, 42 women; mean age: 30.26 ± 7.98 years; training cohort, n = 93; test cohort, n = 30). The area under the receiver operating characteristic curve (AUC) of RM1, RM2, and RM3 was 0.86 (p = 0.001), 0.88 (p = 0.001), and 0.93 (p = 0.02), respectively. The decision curve analysis confirmed a progressive improvement in the diagnostic performance of three RMs with the addition of more specific sequences. All RMs performance surpassed the visual interpretation based on the same MR sequences (visual model 1, AUC = 0.65, p = 0.56; visual model 2, AUC = 0.63, p = 0.04; visual model 3, AUC = 0.77, p = 0.002), as well as the clinical model composed of C-reactive protein and erythrocyte sedimentation rate (AUC = 0.60, p = 0.13). Conclusions The RMs, utilizing various combinations of conventional, DWI and MTI sequences, significantly enhance radiologists’ ability to accurately characterize intestinal fibrosis in patients with CD. Critical relevance statement The utilization of MRI-based RMs significantly enhances the diagnostic accuracy of radiologists in characterizing intestinal fibrosis. Key Points MRI-based RMs can characterize CD intestinal fibrosis using conventional, diffusion, and MTI sequences. The RMs achieved AUCs of 0.86–0.93 for assessing fibrosis grade. MRI-radiomics outperformed visual interpretation for grading CD intestinal fibrosis. Graphical Abstract


Introduction
Fibrostenosis is a severe complication of Crohn's disease (CD) that significantly impacts patients' quality of life.Currently, intestinal fibrosis is recognized as a dynamic and reversible condition rather than a static and irreversible entity [1][2][3][4].None-mild fibrosis may be reversed by medical treatment [4].Unfortunately, there are no effective medical interventions for moderate-severe intestinal fibrosis, which necessitates endoscopic or surgical treatment [1][2][3][4].Accurate detection and grading of intestinal fibrosis severity is crucial for selecting appropriate treatments and may facilitate the development of novel antifibrotic therapies.
As transmural fibrosis cannot be detected through endoscopic visualization or partial-thickness biopsy, cross-sectional imaging serves as a non-invasive means of evaluating fibrostenosis [5].However, conventional imaging remains inadequate or yields inconsistent results in the evaluation of intestinal fibrosis [6][7][8].
While magnetization transfer magnetic resonance imaging (MTI) has demonstrated potential in accurately evaluating intestinal fibrosis [7,9], concerns persist regarding interobserver variability and operator subjectivity.The full benefits of MTI for diagnosing fibrosis cannot be realized by relying solely on a single parameter and conventional delineations with limited areas and regions of interest.To date, there is no consensus on the standard for evaluating intestinal fibrosis in CD.
In recent years, radiomics analysis has increasingly been utilized in medical images to enhance the assessment of CD [10][11][12].Radiomics extracts large amounts of high-dimensional features from images to uncover disease characteristics that may not be discernible through visual inspection or single-parameter analysis [13,14].In a previous study [15], a novel computed tomography enterography (CTE) radiomic model (RM) was developed and validated allowing for accurate characterization of intestinal fibrosis in CD.However, because CTE has the potential risk of radiation, magnetic resonance enterography (MRE) is the preferred follow-up examination for patients with CD.Moreover, multi-sequence MRE performs better than CTE in the assessment of intestinal fibrosis [5][6][7].Tabari et al reported that texture analysis of contrast-enhanced imaging can detect bowel fibrosis in CD [10].However, this study only included small sample size (n = 25).A recent study has explored the assessment of intestinal fibrosis in a mouse model through textural analysis of T2WI [11].To the best of our knowledge, no existing study has investigated the efficacy of multi-sequence MRE-based radiomic analysis for evaluating CD intestinal fibrosis in humans.
This study aims to develop and verify MRE-based RMs for the diagnosis of intestinal fibrosis, with the objective of improving radiologists' diagnostic accuracy and efficiency in characterizing this condition.

Patients
From November 2013 to September 2021, we included 128 consecutive eligible patients with CD from our institution.This retrospective study was approved by the institutional ethics review board, which waived the requirement for obtaining informed consent.
The inclusion criteria were as follows: (a) patients diagnosed with CD based on standard clinical, imaging, endoscopic, and histological investigations [16]; (b) availability of preoperative MRE within three months of surgery for bowel strictures, fistula, or abscess; (c) availability of a histopathologic bowel segment corresponding to a matching affected intestine on MRE.The exclusion criteria were as follows: (a) inadequate imaging quality resulting from moving artefacts; (b) targeted bowel segment presenting with other bowel diseases, or located at an anastomosis; and (c) unidentifiable intestinal contour on MRE due to severe pericentric effusion, intestinal adhesion, or intestinal peristalsis.
The final enroled patients were randomly allocated into training and test cohorts in an approximate ratio of 3:1 for developing and validating the RMs, respectively.

Surgical histopathology evaluation of intestinal fibrosis
We adopted a region-to-region positioning approach [8] to achieve accurate location matching of bowel segments between MRE and surgical specimens and histological sections (Supplementary Materials).One bowel segment was obtained for each patient.
Histological sections were collectively assessed by two pathologists (Z.Y. and Q.C., with 11-12 years of experience in bowel pathology and no access to clinical and radiological information) to reach a consensus.Scores 0-2 were considered none-mild, and scores 3-4 were considered moderate to severe (Supplementary materials).The ratio of none-mild to moderate-severe histopathology fibrotic samples in both the training and test cohorts remained consistent, ensuring that the diagnostic performance of the models was unbiased and robust.

Bowel segmentation
We have chosen T2WI, contrast-enhanced T1WI in the venous phase, DWI, ADC, and MTI for radiomics analysis.[6,17].Three RMs established by different sequence combinations (RM1: T2WI and enhanced-T1WI; RM2: T2WI, enhanced-T1WI, DWI, and ADC; RM3: T2WI, enhanced-T1WI, DWI, ADC, and MTI) were applicable to institutions equipped with varying MRI device conditions across different regions and levels of development (Fig. 1).Three-dimensional volumes of interest (VOIs) were manually segmented on the bowel lesions in each MRE sequence in the training and test cohorts by a gastrointestinal radiologist (M.Z.) with seven years of experience, using MITK (version 2018.04;https://www.mitk.org/).The VOIs for the entire region were delineated along the lesion contour on each sequence's images, excluding the intestinal lumen.These completed VOIs served as masks to select voxels within the lesion.

Features extraction
We extracted the MRE radiomic features using the pyradiomics toolkit (https://pyradiomics.readthedocs.io/en/latest/installation.html).For reproducibility tests, two gastrointestinal radiologists (M.Z. with seven years of experience and Z.F. with three years) annotated additional VOIs on bowel segments in the training and test cohorts.
The radiologist (M.Z.) conducted an intra-observer reproducibility test with a three-month interval between readings.

Features selection and RM development
Radiomic features were selected by recursive feature elimination (RFE) [18], with optimized parameters, including the weight of the feature importance determined by 10-fold cross-validation.Based on the selected radiomic features, a binary classification RM using the support vector machines (SVM) [19] classifier was built to distinguish between none-mild and moderate-severe intestinal fibrosis (supplementary materials).The workflow of the radiomics analysis is shown in Fig. 1.

RM validation
RMs were validated using a completely independent test cohort.The diagnostic performance was evaluated using the area under the receiver operating characteristic curve (AUC) and calibrated using the Hosmer-Lemeshow goodness-of-fit test.Integrated discrimination improvement (IDI) and net reclassification improvement (NRI) were used to test for performance improvement among different RMs.
The impact of factors including the severity of inflammation (none-mild or moderate-severe inflammation), lesion locations (small bowel or colon), different MRI scanners (Magnetom Trio or Prisma; Siemens Healthineers), and bowel lesions with and without penetrating diseases on RM diagnostic performance was investigated.

Development and validation of visual interpretation
The criteria for visual interpretation of intestinal fibrosis, as modified by previous studies [5,20,21], were established based on imaging features or parameters derived from T2WI, enhanced T1WI, DWI, ADC, and MTI (Fig. 2).
Visual interpretation was independently performed by a gastrointestinal radiologist with 11 years of experience (X.L.), who were blinded to pathological and radiomics data.For criteria 1-4 in Fig. 2, the radiologist documented the interpretation of each imaging feature as either '0' for absence or '1' for presence.As for criteria 5-6 in Fig. 2, the corresponding parameter measurements were recorded.These visual interpretation indexes (Fig. 2) were utilized as features for SVM modelling, and this calculation step was identical to that of the RMs.Similarly, three visual models were constructed by SVM utilizing these visual features from identical sequence combinations (visual model 1: T2WI and enhanced-T1WI; visual model 2: T2WI, enhanced-T1WI, DWI, and ADC); visual model 3: T2WI, enhanced-T1WI, DWI, ADC, and MTI) as the three RMs.We employed ROC analyses to assess their performance.For reproducibility testing, two radiologists with 11 (X.L.) and 3 years' (Y.W.) of experience, respectively, conducted visual interpretation, and their inter-observer variability was calculated.

Development and validation of clinical model
Data on clinical features (i.e.age, sex, course of disease, C-reactive protein (CPR), erythrocyte sedimentation rate (ESR), smoking, body mass index, previous surgery history, and Montreal classification for CD) within one week before MRE were retrieved from the electronic medical record system (Table 2).A clinical model was constructed based on the above candidate factors in the training cohort using SVM.The relevance of each clinical feature to the reference standard was assessed.A cross-validation was performed to train the classification model.

Statistical analysis
The sample size was calculated by the ROC statistical method.The parameters were set based on the following input and assumption: power, 80%; two-sided significance level, 0.05; alternative hypothesis of the AUC, 0.80 compared with the null hypothesis of the AUC, 0.50; and an allocation ratio of sample sizes in none-mild and moderate-severe fibrosis of 1:5 [22].Therefore, sample sizes of 66 (11 of none-mild, 55 of moderate-severe fibrosis) in the training cohort and 23 (four of nonemild,19 of moderate-severe fibrosis) in the test cohort were sufficient to detect an AUC different from 0.50, with 80% power if the true AUC was > 0.80.Normally and non-normally distributed data are expressed as the mean ± standard deviation and medians (interquartile ranges), respectively.Either the student's t-test or Welch's t-test was used to calculate the differences among different models and histological scores.Kendall correlation was used to analyze the visual interpretation and histological scores.The DeLong method [23,24] was used to compare ROC among different models.The clinical utility of the models was measured by decision curve analysis.Statistical analyses were performed using Python and SPSS (version 20; https://www.ibm.com/cn-zh/analytics/spssstatistics-software).A two-sided p < 0.05 was considered significant, except for univariate analysis (p < 0.10).

Demographics
The final included population comprised 123 patients (81 men, 42 women; mean age: 30.26 ± 7.98 years) (Fig. 3).Among them, 93 and 30 patients were selected as the training and test cohorts, respectively.The bowel segments all displayed fibrotic strictures with varying degrees of fibrosis, ranging from mild to severe (Supplementary Table 2).The demographic and clinical characteristics are presented in Table 2.

RMs performance Radiomic features selection and observer reproducibility
Fifty fibrosis-related features were selected by RFE from each MRE sequence (details in Supplementary materials); these showed robust inter-and intra-observer reproducibility with intraclass correlation coefficients (ICCs) of 0.969-0.998(all p < 0.001) and 0.977-0.985(all p < 0.001), respectively.The top five most important features and the contribution of each MRE sequence in per RM are shown in Fig. 4. The average durations of bowel segmentation, feature extraction, and selection for RM1, RM2, and RM3 were 58 s, 126 s, and 164 s, respectively for each patient.
RM3 had the highest diagnostic efficacy, and its predicted probability for each intestinal lesion is shown in Fig. 2 MRE criteria of visual interpretation intestinal fibrosis in CD.DWI, diffusion-weighted MRI; ADC, apparent diffusion coefficient; MTR, magnetization transfer ratio Fig. 5E, F.Moreover, the Hosmer-Lemeshow test indicated that RM3 was effective and robust with a x 2 of 8.541 (p = 0.32) and 12.140 (p = 0.58) in the training and test cohorts, respectively (Fig. 5G).

The diagnostic performance of RMs within different subgroups
In both training and test cohorts, there were no significant differences in the performance of RMs for diagnosing fibrosis between different inflammatory bowel segments, strictures in small or colonic, different MRI scanners (Magnetom Trio or Prisma; Siemens Healthineers), and bowel segments with or without penetrating diseases (all p > 0.05) (Supplementary Tables 4-7), indicating the high reliability and stability of these three RMs for diagnosing bowel fibrosis.

Visual interpretation performance
The performance of visual models 1 and 2 in distinguishing degrees of intestinal fibrosis was found to be poor, both in the training (AUC = 0.52, 0.61, respectively) and test cohort (AUC = 0.65, 0.63, respectively) (all p > 0.05).The inclusion of MTI measurements improved the diagnostic performance of visual model 3 both in the training (AUC = 0.77, p < 0.001) and test cohorts (AUC = 0.77, p = 0.002) (Fig. 5A, B).
The inter-observer agreement assessment for MRE features showed poor results for 'T2WI iso-/hypo-intensity',

Clinical model performance
The clinical model was constructed using SVM based on univariate analysis, selecting CPR and ESR from the nine clinical features.However, this clinical model failed to differentiate between the different degrees of intestinal fibrosis in both the training (AUC = 0.52, p = 0.12) and test (AUC = 0.60, p = 0.13) cohorts.

Comparing the diagnostic performance of the RMs with those of the clinical model and visual' interpretation
In the training and test cohorts, the RMs all performed significantly better than the corresponding visual interpretation and the clinical model in evaluating intestinal fibrosis (DeLong's test, all p < 0.05) (Fig. 5A, B).Decision curve analysis showed that the RMs provided a better net benefit for predicting intestinal fibrosis than the clinical model and visual interpretation in the training (Fig. 5C) and test (Fig. 5D) cohorts.

Discussion
To address the lack of non-invasive and non-irradiating tools for precisely detecting transmural fibrosis, we   MRE, with a multi-sequence combination, could reflect various histopathological features of CD intestinal fibrosis.The combination of MTI and T2WI has been reported as a viable method for assessing bowel fibrosis [9].However, conventional image analysis of MRE typically evaluates bowel features in only one image slice or using a limited ROI, assuming the homogeneity of bowel features over the entire image.This may explain why traditional imaging interpretation remains insufficient for detecting intestinal fibrosis accurately or produces inconsistent results in its evaluation [6][7][8][9].The application of radiomic analysis in MRE offers a novel perspective for the characterization of intestinal fibrosis in CD, providing more objective and reproducible transmural measurements.
We proposed three RMs with different combinations of sequences, taking into account the technical feasibility of their implementation and the acceptable scanning time for patients with CD.RM1 with only basic MR sequences had already achieved moderate diagnostic efficacy, demonstrating the potential of radiomics in digging fibrosis-related information hidden in MRE images.This finding is consistent with previous reports on texture analysis using conventional sequences [10,11].Therefore, RM1 is a great option for basic healthcare hospitals that do not perform DWI and MTI scans due to the extended scan time or unavailability of the sequences.Encouragingly, after adding DWI and ADC to RM1 to construct RM2, the diagnostic accuracy improved considerably.This positive result could be because DWI with ADC can reflect the change in extracellular water molecule diffusion that is affected by collagen deposition within the fibrotic bowel segment [25,26].As expected, RM3, which combines the MTI, diffusion, and conventional sequences, had the highest diagnostic efficiency among the three RMs.A previous animal study reported both T2WI texture analysis and MTI can detect established bowel fibrosis [11].MTI enables the indirect quantification of macromolecule concentrations in tissues, reflecting the severity of collagen deposition within the bowel wall regardless of coexisting inflammation [7,9].Applying radiomics to the analysis of MTI images further increases its potential for characterizing intestinal fibrosis.Thus, RM3 exhibited the best diagnostic performance.
Our RMs were developed using radiomic features with good interobserver consistency and robust classification performance in both the training and test cohorts.Furthermore, their efficacy was stable and unaffected by factors such as concomitant inflammation severity.Since inflammation and fibrosis often co-exist within the same intestinal segment, it is important that RMs assess the degree of fibrosis independent of inflammation.Moreover, RMs are unaffected by differences in intestinal location, and the presence of penetrating lesions.No significant differences in the performance of the RMs were observed when utilizing two distinct MRI scanners.
The three RMs demonstrated superior efficacy in grading intestinal fibrosis compared to the visual models using traditional imaging interpretation of corresponding MR sequences.This suggests that RMs can enhance the ability of radiologists to assess fibrosis.This result was consistent with our previous CTE-based radiomic study [15], as it was easier for computer algorithms to accurately identify the details of intestinal fibrosis.On the other hand, while visual models based on conventional and/or diffusion sequences showed poor performance in evaluating fibrosis, incorporating MTI improved their efficacy.This may be attributed to the superior ability of MTI in assessing fibrosis compared to conventional sequences and DWI, which aligned with previous research findings [7,9].Additionally, the poor agreement between observers' interpretation of T2WI and enhanced-T1WI features may be ascribed to variations in their levels of expertise.Similarly, the RMs were superior to the clinical model in diagnosing fibrosis.The fact that this clinical model targeted the whole patient rather than the specific inflamed intestine is the major reason for its inefficiency.
Our study has several limitations.First, the requirement for resected specimens may have resulted in selection bias toward enroling more patients with advanced CD and relatively fewer patients with none-mild fibrosis for analysis.Second, precise point-by-point correlations between MRE and surgical specimens were difficult to obtain due to bowel peristalsis.With a relatively short interval between MRE and surgery along with pre-scanning hypotonic bowel preparation, we achieved region-byregion correlations between MRE and specimens based on the identification of anatomical structure or gross lesion, as described in a prior study [7].Lastly, our study had a relatively limited sample size without including multicenter data.The need for a standardized MRE examination, especially with qualified MTI, hindered the inclusion of eligible centers and samples.However, to our best knowledge, this study included the largest sample size reported to date in the field of MRE-based radiomics analysis of bowel fibrosis in CD.These promising results from our center suggest the necessity and feasibility of future prospective multicenter studies.
In conclusion, multi-sequence MRE-based RMs are a valuable tool for accurately characterizing intestinal fibrosis in CD and can significantly enhance radiologists' diagnostic ability.Our three RMs, constructed using different combinations of conventional, DWI and MTI sequences, meet diverse technical feasibility requirements for implementation in various institutions for patients with CD.

Fig. 1 A
Fig. 1 A The radiomics analysis workflow: RM1, RM2, and RM3 established by different sequence combinations aid in the evaluation of CD fibrosis.B A 41-year-old man with CD with moderate-severe fibrotic stricture in descending colon.Radiomics features extracted from different sequence combinations of T2WI, contrast-enhanced T1WI, DWI, ADC, and MTI are used to construct RM1-RM3.To generate feature maps, the features are calculated for each voxel in the segmented VOIs with a kernel radius of 3, and assigned to the center pixel.ADC, apparent diffusion coefficient; CD, Crohn's disease; DWI, diffusion-weighted imaging; MTI, magnetization transfer MRI; RM, radiomic model

Fig. 5
Fig. 5 The ROC curves (A, B), decision curves (C, D), predicted probabilities (E, F), and calibration curves (G) of the models.The diagnostic performance of the RMs is better than that of both the visual interpretations and clinical model (all p < 0.001 according to DeLong's test) in the training (A) and test (B) cohorts.RMs have higher net benefits than the two visual interpretation and clinical models in the training (C) and test (D) cohorts.For the RM3, plots show predicted probabilities with a cut-off value of 0.360 (indicated by the black solid line) in the training (E) and test (F) cohorts, and the calibration curves (G) in both cohorts.RM, radiomic model; ROC, receiver operating characteristic; AUC, area under the receiver operating characteristic curve

Table 1
MRI sequences and parameters

Table 2
Demographic and clinical characteristics of patients with CD in the training and test cohorts BMI body mass index, CD Crohn's disease, CRP C-reactive protein, ESR erythrocyte sedimentation rate, IQR interquartile range, SD standard deviation a Comparison between the training and total test cohorts 'transmural enhancement', and 'fibrofatty proliferation' with ICCs of 0.331 (p < 0.001), 0.114 (p = 0.10), and 0.275 (p < 0.001), respectively, while the assessment for other features including 'increased intensity on DWI', ADC, MT-ratio, and normalized MT-ratio had good agreement with ICCs of 0.772, 0.768, 0.855, and 0.824, respectively (all p < 0 .001).

Table 3
Diagnostic performance of the RMs, radiologist's visual models and clinical models for detection of moderate to severe fibrosis in the training and test cohorts Data in parentheses are numerators and denominators RMs and visual models were based on different MR sequence combinations: RM1 or visual model 1 (T2WI and contrast-enhanced T1WI), RM2 or visual model 2 (T2WI, contrast-enhanced T1WI, and diffusion-weighted imaging [DWI], apparent diffusion coefficient [ADC]), RM3 or visual model 3 (T2WI, contrast-enhanced T1WI, DWI, ADC, and magnetization transfer MRI) AUC area under the receiver operating characteristic curve, CI confidence interval, NPV negative predictive value, PPV positive predictive value, RM radiomic model a Number of resected bowel segments